01. Deep Reinforcement Learning
INSTRUCTOR NOTE:
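\mathcal{R} is the set of all possible rewards. The reward distribution is specified jointly with the state-transition distribution through the dynamics function p(s', r \mid s, a) = \mathbb{P}(S_{t+1}=s', R_{t+1}=r \mid S_t=s, A_t=a). For every state-action pair (s, a), these probabilities sum to 1 over all next states s' and rewards r \in \mathcal{R}.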
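
To make the joint specification concrete, here is a minimal Python sketch of a tabular dynamics function p(s', r | s, a). The two-state MDP, its state and action names, its probabilities, and the helper names (`check_normalized`, `transition_prob`, `expected_reward`) are all hypothetical and chosen only for illustration. The sketch shows how the usual state-transition probability and the expected reward can both be recovered from the joint distribution by summing.

```python
from collections import defaultdict

# Hypothetical two-state MDP; every number here is made up for illustration.
# dynamics[(s, a)] maps (s_next, r) -> p(s_next, r | s, a).
dynamics = {
    ("s0", "stay"): {("s0", 0.0): 0.9, ("s1", 1.0): 0.1},
    ("s0", "go"):   {("s1", 1.0): 0.7, ("s1", 0.0): 0.1, ("s0", 0.0): 0.2},
    ("s1", "stay"): {("s1", 0.0): 1.0},
    ("s1", "go"):   {("s0", 5.0): 0.5, ("s1", 0.0): 0.5},
}


def check_normalized(dynamics, tol=1e-9):
    """Return True if p(., . | s, a) sums to 1 for every (s, a) pair."""
    return all(abs(sum(dist.values()) - 1.0) <= tol for dist in dynamics.values())


def transition_prob(dynamics, s, a):
    """Marginalize out the reward: p(s' | s, a) = sum_r p(s', r | s, a)."""
    probs = defaultdict(float)
    for (s_next, _reward), p in dynamics[(s, a)].items():
        probs[s_next] += p
    return dict(probs)


def expected_reward(dynamics, s, a):
    """Expected reward: r(s, a) = sum over (s', r) of r * p(s', r | s, a)."""
    return sum(reward * p for (_s_next, reward), p in dynamics[(s, a)].items())


if __name__ == "__main__":
    print(check_normalized(dynamics))
    print(transition_prob(dynamics, "s0", "go"))
    print(expected_reward(dynamics, "s1", "go"))
```

Storing the joint distribution, rather than separate transition and reward tables, keeps the case where the reward depends on the next state (as in the hypothetical `("s0", "go")` entry) representable without extra structure.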